Skip to content

Reorg and new custom decoder support #177

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 45 commits into from
Jul 5, 2025
Merged

Reorg and new custom decoder support #177

merged 45 commits into from
Jul 5, 2025

Conversation

oschwald
Copy link
Owner

  • Move decoder code into new internal package
  • Add change log
  • Make functional option functions return a function
  • Add support for options to Open and FromBytes
  • Add missing docs
  • Tighten up the golangci-lint config
  • Start decoupling the decoding and relection
  • Move size checks to DataDecoder
  • Improve naming of data types and make them public
  • Add separate Decoder type for manual decoding
  • Add support for UnmarshalMaxMinDB

@oschwald oschwald requested a review from Copilot June 22, 2025 21:46
Copilot

This comment was marked as outdated.

oschwald added 15 commits June 28, 2025 16:20
This will be used in the future for things like cache configuration.
Adding now as it is a breaking change.
Although this doesn't provide immediate benefit, the intent is to make
the data decoder available separately in a future commit for public use.
This further decouples the code. There are probably future improvements
for reducing redundancy however.
Extend the UnmarshalMaxMindDB interface to work recursively with nested
types, matching the behavior of encoding/json's UnmarshalJSON. Custom
unmarshalers are now called for:

- Struct fields that implement Unmarshaler
- Pointer fields (creates value if nil, then checks for Unmarshaler)
- Slice elements that implement Unmarshaler
- Map values that implement Unmarshaler

This enhancement allows for more flexible custom decoding strategies in
complex data structures, improving performance optimization opportunities
for nested types.
Add PeekType() method to Decoder that returns the type of the current value
without consuming it, similar to jsontext.Decoder.PeekKind(). This enables
look-ahead parsing for conditional decoding logic.

The method follows pointers to return the actual data type being pointed to
rather than just returning TypePointer.
@oschwald oschwald force-pushed the greg/decoder branch 3 times, most recently from 652ee9d to 9e78231 Compare June 30, 2025 02:49
@oschwald oschwald requested a review from Copilot June 30, 2025 02:52
Copilot

This comment was marked as outdated.

oschwald added 4 commits July 1, 2025 19:50
The Decoder and Unmarshaler types are now available in the mmdbdata
package for applications that need direct access to the decoding API.
Renames Type* constants to Kind* and PeekType() to PeekKind()
throughout the codebase to match Go's encoding/json/v2 naming
conventions. This improves API consistency with the standard
library.
Renames all Decoder methods from Decode* to Read* (e.g.,
DecodeString to ReadString) to match Go's encoding/json/v2
jsontext.Decoder naming conventions. This improves API
consistency with the standard library.
Adds DecoderOption type and variadic options parameter to enable future
configuration without breaking API changes. Follows existing library
patterns like ReaderOption and NetworksOption.
oschwald added 15 commits July 1, 2025 19:50
Improved error messages to include byte offset information and, for the
reflection-based API, path information for nested structures using JSON
Pointer format. For example, errors may now show "at offset 1234, path
/city/names/en" or "at offset 1234, path /list/0/name" instead of just the
underlying error message.

The implementation maintains zero allocation on the happy path through
retroactive path building during error unwinding.
Replaces unbounded cache with fixed 512-entry array using offset-based
indexing. Provides 15% performance improvement while preventing memory
growth and ensuring thread safety for concurrent reader usage.
Move size validation logic to DataDecoder to eliminate duplication
and create single source of truth for data validation.
And make them more accurate.
- Add comprehensive field documentation to Metadata struct with
  references to MaxMind DB specification
- Add BuildTime() convenience method to convert BuildEpoch to time.Time
- Enhance Verify() documentation explaining validation scope and use cases
- Add ExampleReader_Verify showing verification and metadata access
- Add TestMetadataBuildTime to verify BuildTime() method correctness
Add makeTestName helper function to create reasonable test names from
long hex strings. TestDecodeByte and TestDecodeString now show concise
names like '9e06b37878787878...7878' instead of extremely long hex
strings that made test output unreadable.
Remove the experimental deserializer interface and all supporting code:
- Delete deserializer.go interface definition
- Delete deserializer_test.go test file
- Remove deserializer support from reflection.go
- Remove deserializer methods from data_decoder.go
- Remove unused math/big import from data_decoder.go
- Add breaking change notice to changelog recommending UnmarshalMaxMindDB
Error messages for type mismatches now display readable type names like
'Map' and 'Slice' instead of numeric codes, making debugging easier.
Make StringCache, buffer access, InternAt, and all DataDecoder methods
package-private since they are only used within the decoder package.
Keeps Kind methods public as they are exposed via mmdbdata type alias.
Replaces global mutex with per-entry mutexes to reduce allocation
count from 33 to 10 per operation in downstream libraries while
maintaining thread safety and good concurrent performance.
Add BenchmarkCityLookupConcurrent to demonstrate string cache performance
improvements under concurrent load. Tests 1, 4, 16, and 64 goroutines
performing realistic city lookups, providing clear metrics for concurrent
scaling behavior.
oschwald added 8 commits July 4, 2025 11:04
Use direct type assertion instead of reflection-based interface
checking in Decode method for better performance.
Replace nodeReader interface with specialized traverseTree functions
for each record size. Eliminates interface dispatch overhead and
implements branchless offset calculations for improved performance.
Replace reflect.Type.Implements() with type assertion using comma-ok
idiom and remove redundant pointer interface check. The recursive
decode call handles pointer receiver implementations via CanAddr().

Eliminates unmarshalerType variable and reduces code complexity while
maintaining identical functionality and performance.
Remove inaccurate 15% performance improvement claim that was
contradicted by benchmark testing. Add missing BREAKING CHANGE
label for network options API changes.
This commit implements encoding/json/v2 style field precedence rules
for struct field resolution, replacing the previous two-phase
processing with a single-phase approach that properly handles field
conflicts using depth-based and tag-based precedence.

Key changes:
- Replace fieldsType.anonymousFields with fieldInfo metadata structure
- Implement breadth-first traversal for field collection with depth tracking
- Add support for embedded pointer types (*EmbeddedStruct)
- Apply json/v2 precedence rules: shallow beats deep, tagged beats untagged
- Use single-phase processing with FieldByIndex for embedded field access
- Initialize nil embedded pointers during field traversal

Precedence rules applied:
1. Shallowest embedding depth wins
2. Among same depth, explicitly tagged field wins over untagged
3. Among same depth and tag status, first declared wins

Fixes embedded pointer field access that was causing nil pointer
dereferences in complex nested structures.
Adds basic validation for maxminddb struct tags inspired by
encoding/json/v2's tag validation approach. Currently validates:

- UTF-8 encoding of tag values
- Provides foundation for future tag validation improvements

The validation is designed to be non-intrusive - validation errors
are currently ignored to maintain backward compatibility, but the
infrastructure is in place for future enhancements.

This follows the json/v2 pattern of catching obvious user mistakes
while being permissive about edge cases that might be legitimate.
Implement field index reindexing and addressable value wrapper to
reduce reflection overhead. Split field indices for faster access
and eliminate redundant bounds checks during field traversal.

Based on encoding/json/v2 optimizations for better performance in
struct field access patterns.
Optimizes the reflection-based decoder with more efficient
field access patterns and reduced memory allocations.
Eliminates duplicate code paths and unused functions.

Performance improvement measured in geoip2-golang benchmarks.
@oschwald oschwald requested a review from Copilot July 5, 2025 20:46
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR reorganizes the decoder implementation into internal packages, adds a new custom decoding API, and tightens error handling and configuration.

  • Moved all decoding logic into internal/decoder, exposing a high-level decoder.ReflectionDecoder and a public mmdbdata.Decoder factory.
  • Replaced legacy error functions with the internal/mmdberrors package for structured, contextual errors.
  • Refactored options to use functional option patterns and updated APIs (Reader.Open/FromBytes, Networks, Verify, Result.Decode), plus added missing docs, tests, and benchmarks.

Reviewed Changes

Copilot reviewed 37 out of 38 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
verifier.go Switched to use ReflectionDecoder.VerifyDataSection and mmdberrors for errors.
traverse_test.go Updated test to call network options as functions (IncludeAliasedNetworks()).
traverse.go Refactored NetworksOption funcs to return closures and tree traversal adjustments.
result.go Simplified Decode/DecodePath to delegate to decoder.ReflectionDecoder.
reader_test.go Fixed literal types, updated closed-DB error messages, added concurrent benchmark.
reader.go Overhauled Reader to use ReflectionDecoder, introduced ReaderOption, enriched docs.
node.go Removed – replaced by readNodeBySize in traverse.go.
mmdbdata/type.go Added public type and factory aliases for custom decoding.
mmdbdata/interface.go Introduced Unmarshaler interface for custom unmarshaling.
mmdbdata/doc.go Added package documentation for mmdbdata.
internal/mmdberrors/errors.go Defined InvalidDatabaseError and UnmarshalTypeError types.
internal/mmdberrors/context.go Added ContextualError, WrapWithContext, and PathBuilder.
internal/decoder/verifier.go Moved verifyDataSection implementation into ReflectionDecoder.
internal/decoder/* Moved core decoding, reflection logic, caching, and extensive tests under internal/decoder.
Comments suppressed due to low confidence (5)

reader.go:107

  • The code uses errors.New but the "errors" package is not imported. Add "errors" to the import block.
import (

reader.go:331

  • This call to errors.New will not compile because the errors package isn’t imported. Ensure errors is imported.
		return Result{err: errors.New("cannot call Lookup on a closed database")}

reader.go:362

  • This call to errors.New also requires importing the errors package. Add the import to fix the compile error.
		return Result{err: errors.New("cannot call LookupOffset on a closed database")}

reader_test.go:1033

  • You can’t range over an integer. Replace with a standard for-loop, e.g., for i := 0; i < numGoroutines; i++ {.
				for range numGoroutines {

reader_test.go:1042

  • Likewise, lookupsPerGoroutine is an integer, not a slice. Use for i := 0; i < lookupsPerGoroutine; i++ { instead.
						for range lookupsPerGoroutine {

type Result struct {
ip netip.Addr
err error
decoder decoder
decoder decoder.ReflectionDecoder
Copy link
Preview

Copilot AI Jul 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The field name decoder shadows the imported package name decoder. Consider renaming the field (e.g., refDecoder) to improve clarity.

Suggested change
decoder decoder.ReflectionDecoder
refDecoder decoder.ReflectionDecoder

Copilot uses AI. Check for mistakes.

Update ReadMap and ReadSlice to return collection size along with
iterators, enabling efficient pre-allocation of maps and slices.
Iterator remains the primary return value for natural usage patterns.
@oschwald oschwald merged commit 0719621 into main Jul 5, 2025
16 checks passed
@oschwald oschwald deleted the greg/decoder branch July 5, 2025 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant